We explore the partitioning of Mankiw, Romer and Weil’s dataset in “A Contribution to the Empirics of Economic Growth” using dimensionality reduction and clustering algorithms, and discover a few puzzles along the way. The paper’s results seem to be robust to different partitioning methods, as the authors suggest, although we don’t test this rigorously. Results for the Augmented Solow Model are also covered, although they aren’t displayed here and can be viewed using the original Rmarkdown scripts on GitHub.

Exploring Subsets

MRW estimate the Solow and Augmented Solow models on three different subsets of the data: a ‘Non-Oil’ group of countries, an ‘Intermediate’ group and an ‘OECD’ group. However, group membership isn’t mutually exclusive, which makes visualisation a little tricky, as we’ll see in a bit. Furthermore, there are 23 observations in the dataset that don’t match any of these criteria, which is a little bizarre - the authors don’t really have observations to spare when starting from 121.

number country N I O 1960 1985 gdp_ch working_age_pop_ch s school n_g_d
91 Barbados 0 0 0 3165 NA 4.8 NA 0.195 12.1 NA
111 Guyana 0 0 0 2761 NA 1.1 NA 0.324 11.7 NA
114 Surinam 0 0 0 3226 NA 4.5 NA 0.194 8.1 NA
118 Fiji 0 0 0 3634 NA 4.2 NA 0.206 8.1 NA
13 Gabon 0 0 0 1307 5350 7.0 1.4 0.221 2.6 0.064
14 Gambia, The 0 0 0 799 NA 3.6 NA 0.181 1.5 NA

Whilst a lot of these observations are missing values, nine of them have enough data to be included in the model. An almost 10% increase in sample size, even from a more heterogeneous group such as this, would probably improve the power of their tests.

Moving on from these excluded observations, we now turn to whether the partitioning - or at least the semi/sort-of partitioning - is justified in the data.

Clustering

The ideal way to visualise the dummy variable labelling would be with a Venn diagram, but unfortunately there isn’t a great deal of Venn diagram material in R. Instead, we can exploit the fact that although there are six possible dummy variable combinations, the dataset only contains four unique, mutually exclusive groups. This becomes clear if we plot the O, I, N space:
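One way to sketch this plot - the data frame name `mrw` and the jitter settings here are my assumptions, not the original code - is to scatter two of the dummies against each other and map the third to colour:

```r
library(ggplot2)

# Sketch: plot the N/I/O dummy space. 'mrw' and the exact aesthetics are
# assumed; the dummy columns N, I, O come from the table above.
ggplot(mrw, aes(x = N, y = I, colour = factor(O))) +
  geom_jitter(width = 0.05, height = 0.05) +
  labs(x = "N (Non-Oil)", y = "I (Intermediate)", colour = "O (OECD)")
```

With only four occupied corners in this space, the four mutually exclusive groups fall out immediately.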

It makes sense to me to replace MRW’s original dummy structure with these new, mutually exclusive dummies. Instinctively, using non-mutually exclusive dummies to partition the dataset for the regressions seems wrong to me - we’re oversampling some data points, in my opinion.1

Here we encode the four new dummies and convert them from separate dummy columns into a single column of factors named club. The conversion is a little finicky: it requires converting the dummy columns to a matrix and post-multiplying by a column vector of group codes to collapse them into a single factor column:
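A minimal version of the conversion might look like the following; the dummy column names and the club labels are assumptions on my part:

```r
# Collapse four mutually exclusive dummy columns into one factor.
# Column names and labels are assumed, not the original post's code.
dummies <- as.matrix(mrw[, c("club_oil", "club_nonoil_only",
                             "club_intermediate", "club_oecd")])

# Post-multiply by the codes 1:4: each row has exactly one 1, so the
# product recovers that row's group code.
mrw$club <- factor(dummies %*% 1:4,
                   levels = 1:4,
                   labels = c("Oil", "Non-Oil only", "Intermediate", "OECD"))
```

The same trick works for any set of mutually exclusive dummies: the matrix product with `1:k` acts as a lookup of which column is switched on.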

Now we can perform the clustering analysis - the above steps weren’t strictly necessary, but the distinct factors let us visualise the different groups with colour mappings more easily.

Here we perform t-SNE, which seems to be all the rage in the clustering scene right now. First, we create a function to perform the clustering and return a tibble with both the clustered co-ordinates and the corresponding features:
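A sketch of such a helper is below; the function name, feature selection and scaling choices are my assumptions rather than the original code:

```r
library(Rtsne)
library(tibble)

# Hypothetical helper: run t-SNE on the chosen numeric features and
# return the 2-D embedding alongside the original rows.
run_tsne <- function(df, features, perplexity = 30, seed = 1) {
  set.seed(seed)  # t-SNE is stochastic, so fix the seed for reproducibility
  X <- scale(as.matrix(df[, features]))
  fit <- Rtsne(X, dims = 2, perplexity = perplexity, check_duplicates = FALSE)
  as_tibble(cbind(tsne_1 = fit$Y[, 1], tsne_2 = fit$Y[, 2], df))
}
```

Returning the features next to the embedding makes the later colour mappings a one-liner in ggplot.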

Next, we plot the results in two dimensions and use the new factor we created as a colour mapping:
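For example, assuming a tibble `tsne_df` holding embedding columns `tsne_1` and `tsne_2` alongside the club factor (all names are assumptions):

```r
library(ggplot2)

# Colour the 2-D embedding by the mutually exclusive club factor.
ggplot(tsne_df, aes(tsne_1, tsne_2, colour = club)) +
  geom_point(size = 2) +
  labs(x = "t-SNE dimension 1", y = "t-SNE dimension 2")
```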

Whilst the plot on the left makes a reasonable argument for partitioning the dataset, I’d argue that a lot of this comes from the size and colour aesthetics we’ve supplied to ggplot. Therefore, on the right is the same plot stripped of this additional information.

t-SNE has a number of hyper-parameters that need to be tuned - the above plots use Rtsne’s default values. Rather than using some loss metric, as is traditional, for our purposes it makes sense to judge parameter values by visual clarity/overall aesthetic. Therefore, below we map over perplexity values from 1 to 30 and use gganimate to help judge the optimal perplexity value. Personally, I think a value of around nine or ten looks best:
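The sweep can be sketched like this, assuming a hypothetical helper `run_tsne()` that wraps Rtsne and returns a tibble of coordinates (all names here are assumptions):

```r
library(purrr)
library(ggplot2)
library(gganimate)

# Run t-SNE at each perplexity and stack the results; run_tsne() is a
# hypothetical helper returning tsne_1/tsne_2 plus the original columns.
tsne_sweep <- map_dfr(1:30, function(p) {
  out <- run_tsne(mrw, features, perplexity = p)
  out$perplexity <- p
  out
})

# Animate over perplexity so the 'best-looking' value can be eyeballed.
ggplot(tsne_sweep, aes(tsne_1, tsne_2, colour = club)) +
  geom_point() +
  transition_states(perplexity) +
  labs(title = "Perplexity: {closest_state}")
```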

I think the following is a pretty good illustration of simplicity’s importance in communicating ideas graphically:

Another common dimensionality reduction or clustering approach is to use principal component analysis. Below we plot PCA using ggfortify’s addition to autoplot:
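A sketch of the PCA plot, again with assumed feature columns, might be:

```r
library(ggfortify)

# prcomp on the numeric features; the autoplot method for prcomp objects
# comes from ggfortify. Feature column names are assumed.
num_cols <- c("s", "school", "n_g_d")
pca <- prcomp(mrw[, num_cols], scale. = TRUE)
autoplot(pca, data = mrw, colour = "club")
```

Passing the original data through `data =` lets autoplot map the club factor to colour on top of the first two principal components.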

Mankiw, Romer and Weil decide to partition the dataset using their own domain knowledge and theory. This is an entirely valid way to make decisions, and ideally, though not necessarily, the data would support their decision. In this case, I think the evidence from both PCA and t-SNE isn’t clear cut and could be argued either way.

New Dummy Model

Now we move on and compare our new dummy results with the originals.

Loading the original results and running the ‘new models’:
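One way to run the ‘new models’ is to fit the textbook Solow specification within each club subset; the variable names follow the tables in this post, but the exact specification is my assumption:

```r
# Fit log GDP per working-age person in 1985 on log savings and
# log(n + g + d) within each club. Column names are assumed.
new_models <- lapply(split(mrw, mrw$club), function(d) {
  lm(log(gdp_85) ~ log(s) + log(n_g_d), data = d)
})
```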

Here we collect and compare the results using functions from my very first post:
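A generic sketch of the collection step using broom, assuming a named list of fitted models (the helper name and the `type` tag are hypothetical):

```r
library(broom)
library(purrr)

# Stack tidy() output from each model, tagging each row with the subset
# it came from and whether it belongs to the 'New' or original results.
collect_results <- function(models, type = "New") {
  imap_dfr(models, function(m, nm) {
    out <- tidy(m)[, c("term", "estimate")]
    out$subset <- nm
    out$type <- type
    out
  })
}
```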

term estimate subset type
(Intercept) 26.9912257 Oil New
log(s) 1.8019705 Oil New
log(n_g_d) 5.9246642 Oil New
(Intercept) 10.7498808 Intermediate New
log(s) 1.0834370 Intermediate New
log(n_g_d) 0.2858952 Intermediate New

When we ignore the ‘Oil’ subset the results are broadly similar, although it’s worth noting that standard errors and confidence intervals aren’t displayed here:

Whilst the sample size reduction does increase the standard errors, the difference doesn’t look to be huge - it’s hard to say from this graph alone whether it will have a large effect on the conclusions drawn from the data.

Conclusion

We’ve explored some of the modelling assumptions of Mankiw, Romer and Weil’s “A Contribution to the Empirics of Economic Growth” and found that subsetting the data along different lines doesn’t seem to change the results much, although we don’t examine this rigorously.

There’s supportive evidence in the data for splitting up the observations, although, I’d argue, neither the PCA nor the t-SNE results are definitive.

Re-partitioning the data into mutually exclusive subsets is, in my opinion, an improvement on the original method. The last few graphs highlight that MRW were most likely justified in excluding economies largely reliant on oil.


  1. I think this boils down to the question: what population parameter are the authors trying to uncover? If the aim is to identify the parameter for each group, e.g. for policy reasons or because the groups are of interest in their own right, I don’t see a problem. However, if the aim is to empirically identify the Solow model, and these subsets are just draws from some global population parameter, I think MRW’s approach could be improved upon. Finally, regardless of the above, why not include the dummies and their interactions in a pooled model and, at the very least, include our nine excluded observations?